workflow: optimize external link checks (#22894) #22896

Merged
ti-chi-bot[bot] merged 2 commits into pingcap:release-8.5 from ti-chi-bot:cherry-pick-22894-to-release-8.5
May 15, 2026

@ti-chi-bot
Member

This is an automated cherry-pick of #22894

What is changed, added or deleted? (Required)

This PR optimizes the lychee link-check workflows as follows:

  • Updates the weekly full-repository link check to focus on external URLs and exclude file:// internal links.
  • Adds shared scripts to extract site href URLs and changed Markdown lines with link candidates.
  • Converts non-HTTP href values such as href="/tidbcloud/tidb-cloud-quickstart" into URLs based on DOCS_SITE_BASE_URL before checking them.
  • Changes the PR link check to scan only added/modified lines that contain link candidates, instead of every changed Markdown file.
  • Keeps the lychee cache for the weekly full scan, but caches only successful 2xx responses so that failed links are rechecked in later runs.
  • Adds ignore rules for bot-unfriendly or auth-gated external sites reported in recent link-check issues.
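
As a rough illustration of the href conversion described in the list above, the following Python sketch resolves non-HTTP href values against a base URL. It is not the PR's Perl script: the function name `extract_site_hrefs`, the placeholder base URL `https://docs.example.com`, and the set of skipped schemes are all assumptions made for this example. Only the href pattern mirrors the one quoted in the review thread.

```python
import re
from urllib.parse import urljoin

# Hypothetical placeholder; the workflow takes the real value from DOCS_SITE_BASE_URL.
DOCS_SITE_BASE_URL = "https://docs.example.com"

# Same shape as the href pattern quoted in the review thread.
HREF_RE = re.compile(r"""\bhref\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)


def extract_site_hrefs(text, base_url=DOCS_SITE_BASE_URL):
    """Return a checkable URL for each href in text, in first-seen order."""
    seen = set()
    urls = []
    for match in HREF_RE.finditer(text):
        href = match.group(2).strip()
        # Skip empty values, in-page anchors, and non-web schemes (assumed list).
        if not href or href.startswith(("#", "mailto:", "tel:", "javascript:")):
            continue
        if re.match(r"https?://", href, re.IGNORECASE):
            url = href  # Already absolute: check as-is.
        else:
            # Site route such as /tidbcloud/...: resolve against the base URL.
            url = urljoin(base_url.rstrip("/") + "/", href)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

With this shape, pointing the same logic at another docs site would only require a different `base_url` value, which is the reuse benefit the PR describes.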

Benefits:

  • Reduces false positives from docs site route links that lychee previously treated as missing local files.
  • Makes PR checks much faster for broad edits that do not add or modify links, such as deleting aliases across many files.
  • Keeps full scheduled scans reasonably fast while still rechecking previously failed links.
  • Centralizes the href extraction logic so the workflow can be reused more easily in the Chinese docs repository by changing DOCS_SITE_BASE_URL.
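
The speed-up for broad edits comes from looking only at added lines that could contain a link. A minimal Python sketch of that filter over a unified diff follows. It is an illustration, not the PR's Perl script: the function name `changed_link_lines` and the definition of a "link candidate" (an absolute URL, an href attribute, or a Markdown inline link) are assumptions.

```python
import re

# Assumed definition of a link candidate: an absolute URL, an href
# attribute, or the "](" that opens a Markdown inline link target.
LINK_CANDIDATE_RE = re.compile(r"https?://|\bhref\s*=|\]\(", re.IGNORECASE)


def changed_link_lines(diff_text):
    """Yield (path, added_line) for added Markdown lines that may hold links."""
    path = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # "+++ b/path/to/file.md" names the new side of the file.
            target = line[4:].strip()
            path = target[2:] if target.startswith("b/") else target
            continue
        # Only added lines in Markdown files are of interest.
        if path and path.endswith(".md") and line.startswith("+"):
            added = line[1:]
            if LINK_CANDIDATE_RE.search(added):
                yield path, added
```

A diff that only deletes aliases produces no added lines with link candidates, so nothing would be handed to the link checker in that case.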

Which TiDB version(s) do your changes apply to? (Required)

  • master (the latest development version)
  • v9.0 (TiDB 9.0 versions)
  • v8.5 (TiDB 8.5 versions)
  • v8.1 (TiDB 8.1 versions)
  • v7.5 (TiDB 7.5 versions)
  • v7.1 (TiDB 7.1 versions)
  • v6.5 (TiDB 6.5 versions)
  • v6.1 (TiDB 6.1 versions)

What is the related PR or file link(s)?

Do your changes match any of the following descriptions?

  • Delete files
  • Change aliases
  • Need modification after applied to another branch
  • Might cause conflicts after applied to another branch

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot added the do-not-merge/hold, lgtm, needs-1-more-lgtm, size/L, and type/cherry-pick-for-release-8.5 labels May 15, 2026
@ti-chi-bot
Member Author

@qiancai This PR has conflicts, so I have put it on hold.
Please resolve them or ask others to resolve them, then comment /unhold to remove the hold label.

@ti-chi-bot

ti-chi-bot Bot commented May 15, 2026

@ti-chi-bot: If you want to know how to resolve the conflicts, please read the guide in TiDB Dev Guide.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces two Perl scripts, extract-changed-markdown-lines.pl and extract-site-hrefs.pl, to optimize external link checking by extracting link candidates and normalizing relative URLs. It also updates the .lycheeignore configuration. Review feedback highlights that the .lycheeignore file contains unresolved merge conflict markers and duplicate entries that need to be resolved. Additionally, both scripts currently only process HTML href attributes and absolute URLs, missing standard Markdown link syntax, which should be addressed to ensure comprehensive link coverage.

Comment thread .lycheeignore Outdated
Comment on lines 44 to 51
<<<<<<< HEAD
https://portal\.azure\.com/.*
https://.*github.*/%7B%7B%7B%20.tidb_operator_version%20%7D%7D%7D
=======
>>>>>>> 145d861113 (workflow: optimize external link checks (#22894))
https://.*github.*/%7B%7B%7B.tidb-operator-version%7D%7D%7D
https://console\.cloud\.google\.com/.*
https://portal\.azure\.com/.*

high

The file contains unresolved merge conflict markers (<<<<<<<, =======, >>>>>>>) and duplicate entries. These should be removed to ensure the ignore list is correctly parsed and maintained.

https://portal\.azure\.com/.*
https://.*github.*/%7B%7B%7B%20.tidb_operator_version%20%7D%7D%7D
https://.*github.*/%7B%7B%7B.tidb-operator-version%7D%7D%7D
https://console\.cloud\.google\.com/.*
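
The cleanup the reviewer asks for can be sketched mechanically. The Python snippet below drops conflict-marker lines and repeated entries while keeping the first occurrence of each pattern. It is only an illustration under assumptions: the function name `clean_ignore_lines` is hypothetical, and keeping both sides of a conflict is safe here only because the reviewer's suggestion does the same. A real conflict still needs a human decision about which side to keep.

```python
import re

# A conflict marker is seven "<", "=", or ">" characters at the start of a
# line, followed by a space or the end of the line.
CONFLICT_MARKER_RE = re.compile(r"^(<{7}|={7}|>{7})( |$)")


def clean_ignore_lines(lines):
    """Drop conflict markers and repeated entries, keeping first-seen order."""
    seen = set()
    cleaned = []
    for raw in lines:
        line = raw.rstrip("\n")
        if CONFLICT_MARKER_RE.match(line):
            continue
        # Deduplicate non-blank entries only; blank lines pass through.
        if line.strip():
            if line in seen:
                continue
            seen.add(line)
        cleaned.append(line)
    return cleaned
```

Applied to the eight lines quoted in this comment thread, the sketch yields the same four entries as the reviewer's suggestion.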

Comment thread .github/scripts/extract-changed-markdown-lines.pl
next unless defined $content;

my %seen;
while ($content =~ /\bhref\s*=\s*(["'])(.*?)\1/gi) {

medium

This script specifically extracts links from HTML href attributes but ignores standard Markdown link syntax [text](url). Since the majority of links in the documentation are likely in Markdown format, this script will miss many relative links that need to be normalized to site URLs for checking.
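
One way to address this comment would be to match Markdown inline links alongside href attributes. The Python sketch below shows the idea. It is not a patch for the Perl script: the name `link_targets` and the Markdown pattern are assumptions, the pattern also matches image links such as `![alt](img.png)`, and it does not handle reference-style links or targets that contain parentheses.

```python
import re

# Same shape as the href pattern in the script under review.
HREF_RE = re.compile(r"""\bhref\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)

# Inline links only: [text](target) or [text](target "title").
# Assumed pattern; image links also match.
MD_LINK_RE = re.compile(r"\[[^\]]*\]\(\s*<?([^)\s>]+)>?[^)]*\)")


def link_targets(content):
    """Return href and Markdown link targets in document order, deduplicated."""
    found = [(m.start(), m.group(2)) for m in HREF_RE.finditer(content)]
    found += [(m.start(), m.group(1)) for m in MD_LINK_RE.finditer(content)]
    ordered = [target for _, target in sorted(found)]
    # dict.fromkeys keeps the first occurrence of each target.
    return list(dict.fromkeys(ordered))
```

The returned targets are raw values. Relative ones would still need to be resolved against the site base URL before being checked.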

@qiancai
Collaborator

qiancai commented May 15, 2026

/approve

qiancai removed the do-not-merge/hold label May 15, 2026
@ti-chi-bot

ti-chi-bot Bot commented May 15, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qiancai

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot[bot] added the approved label May 15, 2026
ti-chi-bot[bot] merged commit f38e496 into pingcap:release-8.5 May 15, 2026
11 of 12 checks passed
ti-chi-bot[bot] deleted the cherry-pick-22894-to-release-8.5 branch May 15, 2026 07:20
Labels

approved, lgtm, needs-1-more-lgtm, size/L, type/cherry-pick-for-release-8.5
